---
title: AnyText API Usage Guide
slug: 0tF--any
createdAt: Tue Jul 30 2024 05:31:14 GMT+0000 (Coordinated Universal Time)
updatedAt: Wed Jul 31 2024 09:01:31 GMT+0000 (Coordinated Universal Time)
---

# AnyText API Usage Guide

## Introduction

This guide shows developers how to use the `aonet` library to call the AnyText API, which generates and edits text rendered in images.

## Prerequisites

- Node.js environment
- `aonet` library installed
- Valid Aonet APPID

## Installation

Ensure the `aonet` library is installed. If not, you can install it using npm:

```bash
npm install aonet
```

## Usage Instructions

### 1. Import the `aonet` Library

```javascript
const AI = require("aonet");
```

### 2. Configure Options

Create an `options` object containing your APPID:

```javascript
const options = {
    appid: "your_APPID"
};
```

Make sure to replace `"your_APPID"` with your actual Aonet APPID.
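To avoid hard-coding the APPID in source files, one option is to read it from an environment variable. The sketch below assumes a variable named `AONET_APPID`; that name and the `buildOptions` helper are illustrative and not part of the `aonet` library.

```javascript
// Build the options object from an environment-like map.
// AONET_APPID is an assumed variable name chosen for this example.
function buildOptions(env = process.env) {
    const appid = env.AONET_APPID;
    if (!appid) {
        throw new Error("AONET_APPID is not set");
    }
    return { appid };
}
```

The returned object has the same shape as the `options` object above, so it can be passed to the constructor in the same way.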

### 3. Initialize AI Instance

Initialize the AI instance using the configuration options:

```javascript
const aonet = new AI(options);
```

### 4. Invoke AnyText API

Use the `prediction` method to call the AnyText API:

```javascript
async function generateAnyText() {
    try {
        // Send the request to the AnyText endpoint and wait for the result.
        const response = await aonet.prediction("/predictions/ai/anytext", {
            input: {
                "mode": "text-generation",
                // Each line of text to render is enclosed in double quotes.
                "prompt": "photo of caramel macchiato coffee on the table, top-down perspective, with \"Any\" \"Text\" written on it using cream",
                "seed": 200,
                "draw_pos": "https://replicate.delivery/pbxt/LIHKXdjxOWFe7HqP1rliIsghRab48EVQRzGNwQ9RgyO5V03d/gen9.png",
                "ori_image": "https://replicate.delivery/pbxt/LIHMZ8cCvmndHNVufiSuKZA4mnokuSOy87cYqhvs4Diei7sL/edit9.png",
                "img_count": 2,
                "ddim_steps": 20,
                "use_fp32": false,
                "no_translator": false,
                "strength": 1,
                "img_width": 512,
                "img_height": 512,
                "cfg_scale": 9,
                "a_prompt": "best quality, extremely detailed,4k, HD, supper legible text, clear text edges, clear strokes, neat writing, no watermarks",
                "n_prompt": "low-res, bad anatomy, extra digit, fewer digits, cropped, worst quality, low quality, watermark, unreadable text, messy words, distorted text, disorganized writing, advertising picture",
                "sort_radio": "↕",
                "revise_pos": false
            }
        });
        console.log("AnyText result:", response);
    } catch (error) {
        console.error("Error calling the AnyText API:", error);
    }
}

generateAnyText();
```
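Generation requests can take a while, so it may be useful to bound how long the caller waits. The sketch below is a generic timeout wrapper built on `Promise.race`. `withTimeout` is a hypothetical helper, not part of `aonet`, and it only stops waiting on the client side; it does not cancel the request on the server.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    });
    // Whichever settles first wins; always clear the timer afterwards.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Possible usage with the instance created earlier (not executed here):
// const response = await withTimeout(
//     aonet.prediction("/predictions/ai/anytext", { input }),
//     120000
// );
```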

### Parameter Description

- `mode`: str. Specifies which mode of the model to call (`"text-generation"` in the example above).
- `prompt`: str. Prompt describing the content of the image.
- `seed`: int. Random seed, range -1 to 99999999.
- `draw_pos`: url. URL of an image indicating the positions of the generated text.
- `ori_image`: url. URL of the image to be edited.
- `img_count`: int. Number of images to generate, range 1 to 16.
- `ddim_steps`: int. Number of sampling steps, range 1 to 100.
- `use_fp32`: bool. Whether to use FP32 (32-bit floating point) precision.
- `no_translator`: bool. Whether to disable the translator.
- `strength`: float. Control strength of the text control module, range 0.0 to 2.0.
- `img_width`: int. Image width in pixels, valid only in text-generation mode, range 256 to 768.
- `img_height`: int. Image height in pixels, valid only in text-generation mode, range 256 to 768.
- `cfg_scale`: float. Classifier-Free Guidance (CFG) strength, range 0.1 to 30.0.
- `a_prompt`: str. Additional prompt words, typically used to enhance image quality.
- `n_prompt`: str. Negative prompt words.
- `sort_radio`: str. Position sorting priority.
- `revise_pos`: bool. Whether to revise the text positions.
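The numeric ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch; `validateAnyTextInput` and `NUMERIC_RANGES` are hypothetical names introduced for this example, and the API itself may apply different or additional checks.

```javascript
// Ranges copied from the parameter list; the helper itself is illustrative.
const NUMERIC_RANGES = {
    seed: { min: -1, max: 99999999, integer: true },
    img_count: { min: 1, max: 16, integer: true },
    ddim_steps: { min: 1, max: 100, integer: true },
    strength: { min: 0.0, max: 2.0, integer: false },
    img_width: { min: 256, max: 768, integer: true },
    img_height: { min: 256, max: 768, integer: true },
    cfg_scale: { min: 0.1, max: 30.0, integer: false },
};

// Return a list of human-readable problems; an empty list means no issue was found.
function validateAnyTextInput(input) {
    const errors = [];
    for (const [name, rule] of Object.entries(NUMERIC_RANGES)) {
        const value = input[name];
        if (value === undefined) continue; // only check fields that are present
        if (typeof value !== "number" || Number.isNaN(value)) {
            errors.push(`${name} must be a number`);
        } else if (rule.integer && !Number.isInteger(value)) {
            errors.push(`${name} must be an integer`);
        } else if (value < rule.min || value > rule.max) {
            errors.push(`${name} must be between ${rule.min} and ${rule.max}`);
        }
    }
    return errors;
}
```

Returning a list instead of throwing on the first problem lets the caller report every out-of-range field at once.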

## Considerations

- Ensure the provided image URLs are publicly accessible and the images are of good quality for optimal results.
- The API may take some time to process the image and generate results.
- Handle potential errors, such as network issues, invalid input, or API limitations.
- Adhere to terms of use and privacy regulations, especially when processing images containing sensitive information.
- Enter a descriptive prompt (Chinese and English are both supported) in `prompt`. Enclose each line of text to be generated in double quotes, then draw the position for each line of text in sequence. The quality of the generated image depends heavily on how the text positions are drawn, so avoid drawing them carelessly or too small. The number of positions must match the number of text lines, and each position's size should match the length or height of the corresponding text line as closely as possible.
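Since the number of drawn positions must match the number of quoted text lines, it can help to count the quoted lines in a prompt before sending it. The sketch below assumes plain ASCII double quotes; `extractQuotedLines` and `checkPositionCount` are hypothetical helpers, and `positionCount` is a value the caller supplies, because how positions are counted in the `draw_pos` image is outside the scope of this example.

```javascript
// Collect every non-empty "..." segment from the prompt, in order.
function extractQuotedLines(prompt) {
    return Array.from(prompt.matchAll(/"([^"]+)"/g), (match) => match[1]);
}

// Compare the number of quoted lines with the number of drawn positions.
function checkPositionCount(prompt, positionCount) {
    const lines = extractQuotedLines(prompt);
    return { lines, matches: lines.length === positionCount };
}
```

For the prompt used in the example request, `extractQuotedLines` yields the two lines `Any` and `Text`, so two positions would be expected.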

## Example Response

The API response will include the image content after text generation or text editing. Parse and use the response data according to the actual API documentation.

## Advanced Usage

- Implement batch image processing by processing multiple image files in a loop or concurrently.
- Add a user interface to allow users to upload their image files or provide image URLs.
- Implement real-time text recognition by integrating the API into live image streams.
- Integrate post-processing features for text, such as punctuation addition, semantic analysis, or sentiment analysis.
- Consider implementing multi-language support to handle images in different languages as needed.
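As one way to approach the batch-processing idea, the sketch below runs several requests with a small concurrency limit and records each outcome separately, so one failed request does not discard the others. `runBatch` is a hypothetical helper; the `predict` argument is any function that takes an input object and returns a promise, which keeps the helper independent of the `aonet` library.

```javascript
// Run predict(input) for every input, with at most `concurrency` calls in flight.
// Each result is { status: "fulfilled", value } or { status: "rejected", reason }.
async function runBatch(predict, inputs, concurrency = 2) {
    const results = new Array(inputs.length);
    let next = 0;

    // Each worker repeatedly claims the next unprocessed index.
    async function worker() {
        while (next < inputs.length) {
            const index = next++;
            try {
                results[index] = { status: "fulfilled", value: await predict(inputs[index]) };
            } catch (error) {
                results[index] = { status: "rejected", reason: error };
            }
        }
    }

    const workerCount = Math.min(concurrency, inputs.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}

// Possible usage with the instance created earlier (not executed here):
// const results = await runBatch(
//     (input) => aonet.prediction("/predictions/ai/anytext", { input }),
//     inputs
// );
```

Results are stored by index, so their order matches the order of `inputs` regardless of which request finishes first.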

By following this guide, you should be able to effectively use the AnyText API for text generation and editing in images in your applications. If you have any questions or need further clarification, feel free to ask.
